Patent Claim Translation based on Sublanguage-specific Sentence Structure
Authors
Abstract
Patent claim sentences, despite their legal importance in patent documents, still pose difficulties for state-of-the-art statistical machine translation (SMT) systems owing to their extreme length and their special sentence structure. This paper describes a method for improving the translation quality of claim sentences by taking into account the features specific to the claim sublanguage. Our method overcomes the issue of special sentence structure by transferring the sublanguage-specific sentence structure (SSSS) from the source language to the target language, using a set of synchronous context-free grammar rules. It also overcomes the issue of extreme length by taking the sentence components as the processing unit for SMT. The results of an experiment demonstrate that our SSSS transfer method, used in conjunction with pre-ordering, significantly improves translation quality, raising BLEU scores by five points in both the English-to-Japanese and Japanese-to-English directions. The experiment also shows that the SSSS transfer method significantly improves the structural appropriateness of the translated sentences in both translation directions, as indicated by substantial gains of over 30 points in RIBES scores.
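The two ideas in the abstract, reordering sentence components with a synchronous rule and translating each component as its own unit, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the component labels (`PREAMBLE`, `TRANSITION`, `BODY`), the single reordering rule in `RULE`, the regular expression used to find the transitional phrase, and the `translate` callback that stands in for an SMT system.

```python
import re

# Hypothetical synchronous rule (an assumption, not the paper's rule set):
# source-side component order paired with target-side component order.
RULE = (("PREAMBLE", "TRANSITION", "BODY"), ("BODY", "TRANSITION", "PREAMBLE"))

# Assumed list of transitional phrases that split a claim into components.
TRANSITION = re.compile(r"\b(comprising|consisting of|including)\b\s*:?")


def segment(claim):
    """Split a claim into labeled components, or return None if no phrase matches."""
    m = TRANSITION.search(claim)
    if m is None:
        return None
    return {
        "PREAMBLE": claim[:m.start()].strip(),
        "TRANSITION": m.group(1),
        "BODY": claim[m.end():].strip().rstrip("."),
    }


def transfer(claim, translate):
    """Translate each component separately, then emit them in target-side order."""
    parts = segment(claim)
    if parts is None:
        # No recognizable structure: fall back to translating the whole sentence.
        return translate(claim)
    _, target_order = RULE
    return " ".join(translate(parts[label]) for label in target_order)


if __name__ == "__main__":
    # Stand-in "translator" that only brackets its input, so the reordering is visible.
    mark = lambda text: f"[{text}]"
    print(transfer("An apparatus comprising: a pencil; and an eraser.", mark))
```

Because each component is passed to `translate` on its own, the stand-in translator never sees the full claim, which mirrors the idea of using sentence components as the processing unit; a real system would need a much richer rule set and segmentation than this single pattern.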
Similar resources
of MT Summit XV
Patent claim sentences, despite their legal importance in patent documents, still pose difficulties for state-of-the-art statistical machine translation (SMT) systems owing to their extreme lengths and their special sentence structure. This paper describes a method for improving the translation quality of claim sentences, by taking into account the features specific to the claim...
Global Pre-ordering for Improving Sublanguage Translation
When translating formal documents, capturing the sentence structure specific to the sublanguage is extremely necessary to obtain high-quality translations. This paper proposes a novel global reordering method with particular focus on long-distance reordering for capturing the global sentence structure of a sublanguage. The proposed method learns global reordering models from a non-annotated par...
On Integrating Hybrid And Rule-Based Components For Patent MT With Several Levels Of Output
We present a methodology integrating hybrid and rule-based components for speeding up the development of a patent MT system. The methodology is suitable for highly inflecting languages and described on the example of translating patent claims from Russian into English. Based on different combinations of hybrid and rule-based components the system performs shallow or/and deep parsing and provide...
Machine Translation, Ten Years On: Discourse has yet to make a breakthrough
As already mentioned, most machine translation systems perform translation sentence by sentence. But even in the case of paragraph translation, the discourse structure of the target text tends to be identical to that of the source text. However, the sublanguage discourse structures may differ across the different languages, and thus a translated text which assumes the same discourse structure a...
Embedding MT for Generating Patent Claims in English from a Multilingual Interface
In this paper, we present a methodology for the development of interactive domain-tuned patent tools for generating patent claims in English from non-English interfaces. The methodology is based on a merger of an interactive English-to-English patent claim generator, AutoPat, and any external MT engine that might be appropriate for a certain language. The translation procedure is reduced to tra...